[NVIDIA] feat: MiniMax M3 Day 0 support B300 by cquil11 · Pull Request #1724 · SemiAnalysisAI/InferenceX

cquil11 · 2026-06-12T19:50:55Z

MiniMax-M3 MXFP8 day-zero single-node vLLM sweep on B300.

New config minimaxm3-fp8-b300-vllm (.github/configs/nvidia-master.yaml) — TP8/TP4/TEP/DEP plus a tp2-ep2 entry across 1k1k and 8k1k (40 jobs).
New bench script benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b300.sh — --block-size 128 (MSA sparse attention), --language-model-only, conc-scaled cudagraph capture, MXFP8 checkpoint; serves from the launch_b300-nv.sh MODEL/MODEL_PATH split (unstaged model -> writable /data/models).
Image: dedicated vllm/vllm-openai:minimax-m3 (already the cu130 build; M3 support unmerged upstream — [Model] Add MiniMax M3 support vllm-project/vllm#45381).

Status: full sweep green (40/40 + 5/5 GSM8K, zero failures). Pareto: TP8 wins latency (~65 tok/s/user @ c4); TP4+EP4 wins 1k1k throughput (1909 tok/s/GPU @ c512); TP4 wins 8k1k (591 tok/s/GPU @ c128). Runs: canary, full.

🤖 Generated with Claude Code

Note

Low Risk
Additive benchmark config and shell script only; no changes to core inference, auth, or shared runtime beyond new CI sweep jobs.

Overview
Adds day-zero single-node throughput coverage for MiniMax-M3 (MiniMaxAI/MiniMax-M3-MXFP8) on B300 via a new minimaxm3-fp8-b300-vllm entry in nvidia-master.yaml, using the dedicated vllm/vllm-openai:minimax-m3 image and fixed-seq-len sweeps at 1k1k and 8k1k across TP/EP and data-parallel attention layouts.

Introduces benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b300.sh, which downloads the unstaged checkpoint to the B300 MODEL/MODEL_PATH layout, extends engine readiness timeout for large MXFP8 loads, and launches vLLM with mandatory --block-size 128, --language-model-only, and concurrency-scaled CUDA graph capture before running the standard serving benchmark (optional eval).

Documents the change in perf-changelog.yaml.

Reviewed by Cursor Bugbot for commit aff01bc. Bugbot is set up for automated code reviews on this repo.

MXFP8 single-node vLLM sweep (TP/TEP/DEP, incl. tp2-ep2) for MiniMax-M3 on B300. --block-size 128 (MSA sparse attention), --language-model-only for text-only throughput, dedicated vllm/vllm-openai:minimax-m3 image (vllm-project/vllm#45381). Serves from the launch_b300-nv.sh MODEL_PATH split (unstaged model -> writable /data/models). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
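As a rough illustration of how the launch flags named above might be assembled for one sweep point, here is a minimal sketch. It is not the contents of minimaxm3_fp8_b300.sh: the helper name `build_vllm_args`, its positional parameters, and the use of `--max-num-seqs` to cap batch size at the sweep concurrency are all assumptions. Only `--block-size 128` and `--language-model-only` come from the PR text. The CUDA graph capture setting is left out because the PR does not name the flag it uses.

```shell
#!/bin/sh
# Hypothetical helper: print the `vllm` arguments for one sweep point.
# Arguments (all assumed names): model path, TP size, EP size, concurrency.
# The function body runs in a subshell so its variables stay local.
build_vllm_args() (
  model_path="$1"; tp="$2"; ep="$3"; conc="$4"

  args="serve $model_path --tensor-parallel-size $tp"
  # Both flags below are stated in the PR description.
  args="$args --block-size 128 --language-model-only"
  # Assumption: cap the scheduler batch size at the benchmark concurrency.
  args="$args --max-num-seqs $conc"

  # Expert parallelism is a boolean switch in vLLM; add it when EP > 1.
  if [ "$ep" -gt 1 ]; then
    args="$args --enable-expert-parallel"
  fi

  # The joined string is for display only; paths with spaces would need quoting.
  printf '%s\n' "$args"
)

# Example: the TP4+EP4 layout at concurrency 512, with a placeholder path.
build_vllm_args /data/models/MiniMax-M3-MXFP8 4 4 512
```

Printing the arguments rather than executing them keeps the sketch safe to run on a machine without GPUs or the dedicated image.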

github-actions · 2026-06-12T19:51:04Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-06-12T19:55:33Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27439474149
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27439474149

jasonlizhengjian · 2026-06-12T20:30:22Z

btw these comments also apply for B200 so please mirror there

Address PR #1724 review: TP8+EP8 conc-start 128->4 (1k1k and 8k1k) to probe whether TEP8 extends the min-latency frontier below plain TP8; TP4+EP4 conc-start 128->64 (1k1k) to fill the mid-curve. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Mirror PR #1724 review changes to B200: TP8+EP8 conc-start 128->4 (1k1k and 8k1k) to probe whether TEP8 extends the min-latency frontier below plain TP8; TP4+EP4 conc-start 128->64 (1k1k) to fill the mid-curve. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-06-12T20:33:18Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27439481220
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27439481220

Lower conc-start 4->1 on the latency-probing layouts (tp8, tp8+ep8, tp4) for both 1k1k and 8k1k to capture single/dual-request min-latency points. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
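The commit message above says that lowering conc-start from 4 to 1 adds the single- and dual-request points, which is consistent with a sweep that doubles concurrency at each step. The sketch below assumes that doubling rule. `conc_points` is a hypothetical helper, not something taken from the repository, and the real sweep generator may differ.

```shell
#!/bin/sh
# Hypothetical helper: list concurrency points from conc-start to conc-end,
# doubling each step (an assumption inferred from the commit messages).
# The function body runs in a subshell so its variables stay local.
conc_points() (
  c="$1"; end="$2"; out=""

  # Guard against a start of 0 or less, which would never reach the end.
  [ "$c" -ge 1 ] || { echo "conc-start must be >= 1" >&2; exit 1; }

  while [ "$c" -le "$end" ]; do
    out="${out:+$out }$c"   # append with a single separating space
    c=$((c * 2))
  done
  printf '%s\n' "$out"
)

conc_points 4 16   # before the change: 4 8 16
conc_points 1 16   # after the change:  1 2 4 8 16
```

Under this assumption, moving the start from 4 to 1 adds exactly two low-concurrency points and leaves the rest of the curve unchanged.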

github-actions · 2026-06-12T21:15:15Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27441382555
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27441382555

github-actions · 2026-06-12T23:23:23Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27443462979
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27443462979

functionstackx · 2026-06-12T23:45:25Z

/reuse-sweep-run

github-actions · 2026-06-12T23:46:03Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27449542635
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27449542635

Keep minimaxm3-fp8-b300-vllm single-node config from main (#1724) alongside GB200/GB300 dynamo full sweep configs. Preserve GLM5 and M2.5-FP4-B300-TRT perf-changelog entries from main.

* [NVIDIA] feat: MiniMax M3 Day 0 support B200

  MXFP8 single-node vLLM sweep (TP/TEP/DEP) for MiniMax-M3 on B200. --block-size 128 (MSA sparse attention), --language-model-only for text-only throughput, dedicated vllm/vllm-openai:minimax-m3 image (vllm-project/vllm#45381). Adds the b200-dgxc runner-type group and a launch_b200-dgxc.sh MODEL_PATH case for the gharunner-staged weights.

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* minimaxm3-fp8-b200-vllm: add perf-changelog entry

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* minimaxm3-fp8-b200-vllm: extend TEP8 to low conc for latency frontier

  Mirror PR #1724 review changes to B200: TP8+EP8 conc-start 128->4 (1k1k and 8k1k) to probe whether TEP8 extends the min-latency frontier below plain TP8; TP4+EP4 conc-start 128->64 (1k1k) to fill the mid-curve.

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* minimaxm3-fp8-b200-vllm: add conc 1 and 2 to latency layouts

  Lower conc-start 4->1 on the latency-probing layouts (tp8, tp8+ep8, tp4) for both 1k1k and 8k1k to capture single/dual-request min-latency points.

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

cquil11 requested a review from a team June 12, 2026 19:50

cquil11 requested review from jgangani and kedarpotdar-nv as code owners June 12, 2026 19:50

github-project-automation Bot added this to InferenceMAX Board Jun 12, 2026

cquil11 added the full-sweep-fail-fast label Jun 12, 2026

cquil11 and others added 2 commits June 12, 2026 14:54

minimaxm3-fp8-b300-vllm: add perf-changelog entry

512a376

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Merge branch 'main' into feat/minimax-m3-b300

a9a6b98

jasonlizhengjian requested changes Jun 12, 2026

Comment thread .github/configs/nvidia-master.yaml Outdated

Comment thread .github/configs/nvidia-master.yaml Outdated

Comment thread .github/configs/nvidia-master.yaml Outdated

functionstackx mentioned this pull request Jun 12, 2026

[Klaud Cold][NVIDIA] feat: MiniMax M3 Day 0 support H200 #1728

Closed

minimaxm3-fp8-b300-vllm: add conc 1 and 2 to latency layouts

1ccf751

Lower conc-start 4->1 on the latency-probing layouts (tp8, tp8+ep8, tp4) for both 1k1k and 8k1k to capture single/dual-request min-latency points. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into pr-1724-reuse

aff01bc

functionstackx merged commit bfd2371 into main Jun 12, 2026
5 of 6 checks passed

functionstackx deleted the feat/minimax-m3-b300 branch June 12, 2026 23:45

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 12, 2026

functionstackx mentioned this pull request Jun 13, 2026

[Klaud Cold] minimaxm3-fp8-b300-vllm-mtp: day-zero MiniMax-M3 EAGLE3 (MTP) B300 recipe #1733

Merged

cquil11 mentioned this pull request Jun 13, 2026

[NVIDIA] feat: MiniMax M3 Day 0 MTP (EAGLE3) support B300 #1737

Closed

functionstackx merged 6 commits into main from feat/minimax-m3-b300

3 participants
